improve: filter only own updates for read-after-write-consistency#3399
improve: filter only own updates for read-after-write-consistency#3399csviri wants to merge 18 commits into
Conversation
170c235 to
51d7087
Compare
413171e to
9cd3b73
Compare
|
@metacosm @shawkins @xstefank This approach however would be bulletproof if we handle the re-list in the informer from the fabric8 pr above. Since in some (rather extreme) edge cases, if there is a re-list, event might be lost. So the idea would be (will create a separate PR) that, if we got notified that goind to be a re-list we flip a switch, and until the sync is finished we simply allow to pass (don't filter) any events. I think without that this is not even possible to do 100% correctly. (but maybe @shawkins has some ideas) https://github.com/fabric8io/kubernetes-client/pull/7899/changes So this is the correct way of doing, with "maximal precision", the good think is that it is actually relatively simple.
|
444d683 to
d8682cc
Compare
|
added #3406 that covers the informer re-list, will continue now adding unit tests etc. |
3e986ab to
88c8e52
Compare
There was a problem hiding this comment.
Pull request overview
This PR updates the event-filtering/temporary-cache logic to better support read-after-write consistency by distinguishing “own” updates from external/intermediate updates, and adds a regression integration test covering deletion racing with a status update.
Changes:
- Reworks
TemporaryResourceCacheevent filtering to record “own” resourceVersions and return aGenericResourceEvent(viaOptional) instead of anEventHandlingenum. - Updates
ManagedInformerEventSource,InformerEventSource, andControllerEventSourceto consume the newOptional<GenericResourceEvent>flow and propagate summary events when appropriate. - Adds a new integration test scenario for deletion during status update, plus updates/extends unit tests around informer/controller event handling.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateStatus.java | Adds a status POJO used by the new deletion-vs-status-update regression test. |
| operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateReconciler.java | Test reconciler that forces a controlled race between status patching and deletion. |
| operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateIT.java | New regression IT ensuring cleanup is triggered when delete races with status update. |
| operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/deletionduringstatusupdate/DeletionDuringStatusUpdateCustomResource.java | New CR type used by the regression IT. |
| operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCacheTest.java | Updates/extends cache tests for the new Optional<GenericResourceEvent> behavior and intermediate-event handling. |
| operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSourceTest.java | Updates informer event source tests; adds repeated tests and new intermediate-event scenarios. |
| operator-framework-core/src/test/java/io/javaoperatorsdk/operator/processing/event/source/controller/ControllerEventSourceTest.java | Adds controller-level tests for intermediate-event propagation/deferral during filtering. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java | Core behavior change: event filtering, own-version tracking, and intermediate-event propagation logic. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/ManagedInformerEventSource.java | Adapts update+cache flow to new doneEventFilterModify contract and event propagation. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSource.java | Adapts informer callbacks to new Optional<GenericResourceEvent> and propagates the event payload. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java | Introduces GenericResourceEvent payload used by the new filtering logic. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/EventFilterDetails.java | Refactors tracking to store related events and “own” resourceVersions to decide propagation/summary behavior. |
| operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/controller/ControllerEventSource.java | Updates controller event handling to use Optional<GenericResourceEvent> and the new delete handling path. |
Comments suppressed due to low confidence (2)
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java:31
- The identifier "lastStateUnknow" is misspelled (should be "lastStateUnknown"). Since it is exposed via a public getter, the typo leaks into call sites and makes the API harder to understand.
operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/GenericResourceEvent.java:72 - equals() ignores lastStateUnknow. If lastStateUnknow is semantically relevant (it changes delete semantics), equals/hashCode should be updated together to include it; otherwise consider documenting that it's intentionally excluded.
| // in this case we received and event that might be in some edge case that was | ||
| // already used in reconciler or after that, but before our updated resource version. |
| log.debug( | ||
| "Propagating event for {}, resource with same version not result of a reconciliation.", | ||
| "Propagating event for {}, resource with same version not result of a our update.", | ||
| action); |
| if (comp == 0) { | ||
| result = Optional.empty(); | ||
| } |
|
@shawkins was thinking about this more, and having optimistic locking would be a too strong restriction in my opinion, compared to that even not doing event filtering is a better option. So I prepared this PR and the one that we can merge subsequently: With these two we would cover this functionality by just filtering events directly related to our updates, or not do filtering when there is a relist going on. If the complexity would be proven to an issue and we would still hit some problematic cases, as an alternative we could just do a heuristic that would be very simple:
But for both alternatives, we at the end need to be aware that re-list happened and events might be lost, so regardless we would need this: please let me know what do you think. Will finalize the PRs today. cc @manusa |
|
Just one thought about the complexity, yes these algorithms are not trivial, but also would make it much easier for the users to implement reconciliation logic. So there is such complexity in the framework or at the end in some cases in the reconcilers, so having these problems solved by is where the frameworks are really useful. |
cc08d4d to
51ecd96
Compare
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
51ecd96 to
2deff06
Compare
| private final Boolean lastStateUnknow; | ||
|
|
||
| public ExtendedResourceEvent( | ||
| public GenericResourceEvent( |
There was a problem hiding this comment.
We shouldn't do this kind of renames of public classes in minors
| return; | ||
| } | ||
| if (resultEvent.orElseThrow().getAction() != ResourceAction.DELETED) { | ||
| log.warn("Non delete event received on onDelete handling. This should not happen."); |
There was a problem hiding this comment.
| log.warn("Non delete event received on onDelete handling. This should not happen."); | |
| log.warn("An event which is not a delete event recieved in onDelete handling. This should not happen."); |
There was a problem hiding this comment.
maybe we should also log the real event received
| // in this case we received and event that might be in some edge case that was | ||
| // already used in reconciler or after that, but before our updated resource version. |
Signed-off-by: Attila Mészáros a_meszaros@apple.com